{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gradio as gr\n",
    "from load_quantization import load_int\n",
    "\n",
    "import numpy\n",
    "\n",
    "# Model path\n",
    "tokenizer, model = load_int('/home/leosher/models/DoctorGLM/DoctorGLM-6B-INT4-6merge.pt',4)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def f(question_submit,max_length_submit,top_p_submit, temperature_submit,repetition_penalty_submit):\n",
    "    response, _ = model.chat(tokenizer,\n",
    "                               query=question_submit,\n",
    "                               history=[],\n",
    "                               max_length=max_length_submit,\n",
    "                               top_p=top_p_submit,\n",
    "                               temperature=temperature_submit,\n",
    "                               repetition_penalty=float(repetition_penalty_submit))\n",
    "    return(response)\n",
    "logo_URL = \"https://raw.githubusercontent.com/xionghonglin/DoctorGLM/main/imgs/logo.png\" # Logo\n",
    "image = \"<center> <img src= {} width=200px></center>\".format(logo_URL)\n",
    "\n",
    "\n",
    "question=gr.Textbox(placeholder='Please describe your disease', label='Descriptions')\n",
    "answer=gr.Textbox(label='DoctorGLM suggestion')\n",
    "max_length=gr.Slider(1, 2048, value=2048, label=\"Max length\", info=\"Choose the max length when generating the answer (2048 is suggested)\")\n",
    "top_p=gr.Slider(0, 1, value=0.7, label=\"Top-p\", info=\"Keep top-p proabilities token (0.7 is suggested)\")\n",
    "temperature=gr.Slider(0, 1, value=0.95, label=\"Top-p\", info=\"Control the randomness and creativity of generated text (0.95 is suggested)\")\n",
    "repetition_penalty=gr.Slider(1, 50, value=10.0, label=\"repetition_penalty\", info=\"Penalty on generating repetition words\")\n",
    "\n",
    "sharing=False # If you want others not in the same local net with you to visit your application, please set this to `True`` \n",
    "\n",
    "gui=gr.Interface(fn=f,inputs=[question,max_length,top_p, temperature,repetition_penalty],outputs=answer,title='DoctorGLM Online Interaction (Version: 0.01 alpha)',description = image,\n",
    "article=\n",
    "'''# Examples:\\n \n",
    "## Description \\n\n",
    "我腹痛，怎么办？\n",
    "## DoctorGLM suggestion\\n\n",
    "腹痛的病因很多.如腹腔内器官疾病(慢性胃病、肠梗阻等)。胃结石胃肠道感染及应用抗生素等情况均有可能导致肠胃疼痛；肠道寄生虫，病钩虫病和大便有可能出现排便时的刺痛感，及肛门周围痛胃肠炎也会有腹泻次数增多和大便排出时有不适的感觉等症状。建议到正规医院仔细检查明确病因后再治疗以免误诊误机的发生\\n\n",
    "## Description\\n\n",
    "我爷爷高血压可以喝咖啡吗？\n",
    "## DoctorGLM suggestion\\n\n",
    "你好，咖啡会导致体内咖啡因过量兴奋神经系统引起情绪激动头痛失眠等不适症状建议高血压患者不宜喝太多咖啡和饮料\\n\n",
    "# Technical Limitations:\n",
    "\n",
    "This work is in a very early stage and contains numerous mistakes, making it unsuitable for any commercial or clinical use.\n",
    "One of the reasons we have published our work is to invite the broader community to help improve this healthcare-focused language model, with the aim of making it more accessible, affordable, and convenient for a larger audience. Below are some critical technical issues we encountered during this project:\n",
    "\n",
    "1. DoctorGLM experiences a loss in capability during logistic training, and it occasionally repeats itself. We suspect that fine-tuning typically incurs a higher alignment cost compared to reinforcement learning with human feedback (RLHF).\n",
    "2. Generating a response takes approximately 15 to 50 seconds, depending on token length, which is significantly slower than interacting with ChatGPT via the web API. This delay is partly due to the chatbot's typing indicator.\n",
    "3. We are currently facing difficulties in quantizing this model. While ChatGLM runs satisfactorily on INT-4 (using about 6G), the trained LoRA of DoctorGLM appears to have some issues. As a result, we are currently unable to deploy our model on more affordable GPUs, such as the RTX 3060 and RTX 2080.\n",
    "4. We have noticed that the model's performance declines with prolonged training, but we currently lack a strategy for determining when to stop training. It appears that cross-entropy is an overly rigid constraint when fine-tuning LLMs.\n",
    "''').launch(share=sharing)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.8 ('base')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "eaeac412cb94074bb77ccb7d42ca21d1d210903378c2fcd94db5ea1fbda226dc"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
